Supervisory Control Theory for LLM Revision
Wangfan Li (Clemson University), Carlos Toxtli (Clemson University)
Engineering & Operations · Architectural Patterns & Composition
Prompt-Level Supervisory Alignment (PLSA) is a structured prompting framework that applies Supervisory Control Theory (SCT), a cognitive model of human oversight of automated systems, to guide iterative LLM self-revision. In a large-scale evaluation on ML conference paper revision tasks, SCT-structured prompts produce revisions with significantly higher fidelity to the authors' actual revisions than both a single-pass baseline and a matched self-refinement baseline.
Presentation
Talk
Paper Session 6: Learning & Control
Thursday, May 28 · 4:10 PM – 4:20 PM
Bayshore Ballroom
Poster
Thursday, May 28 · 4:30 PM – 6:00 PM
Carmel
Abstract
Iterative self-refinement is the dominant paradigm for improving LLM outputs without retraining, yet it lacks principled grounding for what to refine or in what order. We propose Prompt-Level Supervisory Alignment (PLSA), a framework that operationalizes Supervisory Control Theory (SCT), a cognitive framework for human oversight of automated systems, as a structured prompting strategy. We then empirically evaluate whether theoretically grounded prompt structure yields higher revision fidelity than matched iterative self-refinement. In a large-scale evaluation across ten venue-year combinations from three ML conference series (ICLR 2021-2025; NeurIPS 2021, 2022, and 2024; CoRL 2021 and 2024), SCT-structured conditions produce revisions with significantly higher fidelity to actual author revisions than both a single-pass baseline and a matched two-pass self-refinement baseline that uses identical review information without SCT structure (all p < .001, medium-to-large effect sizes). All conditions maintain practically equivalent LLM-judge quality, and cross-model evaluation with Google Gemini 2.5 Flash-Lite corroborates the condition rankings, indicating that the findings are not artifacts of generator self-preference. These results provide empirical evidence that theoretically grounded prompt structure, not merely iterative refinement, is the operative variable driving higher revision fidelity.
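To make the three compared conditions concrete, the following is a minimal sketch, not the authors' implementation. The prompt wording, the function names, the `LLM` callable type, and the split of the structured condition into a planning call followed by a monitor-and-intervene call are all assumptions made for illustration; this listing does not give the actual PLSA prompt stages. The sketch only shows the design point from the abstract: the matched baseline and the structured condition see the same review text and use the same number of passes, and differ in prompt structure alone.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def single_pass(llm: LLM, draft: str, reviews: str) -> str:
    """Single-pass baseline: one revision call."""
    return llm(
        "Revise the paper to address the reviews.\n\n"
        f"PAPER:\n{draft}\n\nREVIEWS:\n{reviews}"
    )


def two_pass_refine(llm: LLM, draft: str, reviews: str) -> str:
    """Matched baseline: unstructured self-refinement with the same reviews."""
    first = single_pass(llm, draft, reviews)
    return llm(
        "Refine this revision.\n\n"
        f"REVISION:\n{first}\n\nREVIEWS:\n{reviews}"
    )


def sct_structured(llm: LLM, draft: str, reviews: str) -> str:
    """Structured condition (assumed phases): each pass has a supervisory role."""
    # Pass 1 (assumed "plan" role): decide what to change and in what order.
    plan = llm(
        "PLAN: List the changes the reviews call for, in priority order.\n\n"
        f"PAPER:\n{draft}\n\nREVIEWS:\n{reviews}"
    )
    # Pass 2 (assumed "monitor and intervene" role): apply and verify the plan.
    return llm(
        "MONITOR AND INTERVENE: Apply the plan, then check each item "
        "and fix any that is unmet.\n\n"
        f"PLAN:\n{plan}\n\nPAPER:\n{draft}\n\nREVIEWS:\n{reviews}"
    )
```

Any function that takes a prompt and returns text can stand in for `llm`, so the three conditions can be run against the same model and the same paper-review pairs before their outputs are scored for fidelity.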